eQTL Studies: from Bulk Tissues to Single Cells

An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of certain genes that can be both nearby or distant. The identifications of eQTLs for different tissues, cell types, and contexts have led to better understanding of the dynamic regulations of gene expressions and implications of functional genes and variants for complex traits and diseases. Although most eQTL studies to date have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulations in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detections of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.


Introduction
Recent years have seen great progress of identifying genomic regions associated with complex traits and diseases through genome wide association studies (GWAS), where tens of thousands of genomic regions have been associated thousands of traits, including many complex diseases (see the curated GWAS results at https://www.ebi.ac.uk/gwas/). One challenge of interpreting GWAS findings is that the majority of associated genetic variants, e.g. single nucleotide polymorphisms (SNPs), are in intergenic regions, making it difficult to infer functional genes and variants in these regions. Many efforts have been made to annotate the human genome through experimental (e.g. ENCODE project (CONSORTIUM 2012;CONSORTIUM et al. 2020); Roadmap Epigenomics Project (ROADMAP EPIGENOMICS et al. 2015); psychENCODE (ZHU et al. 2018)) and computational approaches (e.g. CADD (KIRCHER et al. 2014 it was shown that SNPs associated with complex traits are significantly more likely to be eQTLs identified than minor-allele-frequency-matched SNPs (NICOLAE et al. 2010). In another study (KINDT et al. 2013) assessed enrichment and depletion of variants in different annotation classes, including genic regions, regulatory features, measures of conservation, and patterns of histone modifications. It was found that annotations associated with chromatin state together with eQTLs were the most enriched group. These early results stimulated many large community efforts to collect gene expression and genotype data for eQTL studies, and the accumulation of eQTL results parallels the great success of GWAS. If a SNP is associated with a complex trait, and is also associated with the expression level of a specific gene, this gene may be implicated as a possible candidate gene for the disease. A number of methods have been developed to formalize this idea for co-localization analysis that aims to find the SNPs that are associated with both expression and complex traits (HORMOZDIARI et al. 2016;WEN et al. 2016;GIAMBARTOLOMEI et al. 2018). Transcriptome wide association analysis methods have been developed to use eQTL data to predict the expression levels and associate the predicted (imputed) expression levels with the observed complex traits (GAMAZON et al. 2015;GUSEV et al. 2016;HU et al. 2019). Mendelian randomization methods have also been proposed to investigate whether the expression trait is a causal factor for a complex trait of interest (RICHARDSON et al. 2020;YUAN et al. 2020;ZHOU et al. 2020;LIU et al. 2021b).
The most well-known eQTL study is the GTEx project where dozens of tissues from hundreds of individuals were analyzed to identify tissue specific eQTLs (CONSORTIUM 2020). The GTEx project has proved to be an extremely valuable resource for the research community. The version 8 of the GTEx analyzed 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. It was found that cis-eQTLs showed 1.46-fold enrichment in the GWAS catalogue (https://www.ebi.ac.uk/gwas/) where significant GWAS association results are collected. The cross tissue eQTL similarities were consistent with tissue relatedness, with testis, lymphoblastoid cell lines, whole blood, and liver distinct from other tissues, tissues from the brain region forming one cluster, and other organs being more similar to each other.
BLUEPRINT collects genetic, epigenetic, and transcriptomic profiling in three immune cell types to investigate the contributions of different factors in gene expression (CHEN et al. 2016). eQTL catalogue is a resource developed by reprocessing data from dozens of studies with more than 30,000 samples in total, where summary statistics are available for many cell types and tissues (KERIMOV et al. 2021). The results from these studies and resources thus generated have demonstrated the values of eQTL information in inferring causal genes and variants at GWAS loci.
Most eQTL studies to date have been performed on bulk samples, where the estimated effect size of a SNP represents the average effects across different cell types, and the cell-type origin(origins) of the inferred eQTLs is(are) often unknown for a bulk sample consisting of distinct cell types. Despite some successes of using eQTL results to infer disease causing genes and variants, recent studies based on both modeling (YAO et al. 2020) and carefully chosen gene-trait pairs (CONNALLY et al. 2022) have shown that the known eQTLs, which are mostly derived from analysis of bulk tissues, only explain a very small proportion of the GWAS signals, where only a small proportion of the GWAS hits colocalize with eQTL SNPs. There is a growing evidence (as summarized below) that eQTL effects are often cell-type-specific and/or context-dependent, and many of the eQTLs uniquely identified through cell-type-specific and context-dependent analysis (either experimentally or computationally) colocalize with GWAS results (AGUIRRE-GAMBOA et al. 2020;DONOVAN et al. 2020;KIM-HELLMUTH et al. 2020;PATEL et al. 2021), suggesting the importance of cell-type-specific and context-dependent eQTLs for interpreting and understanding GWAS signals. Therefore, there is a great need to identify these additional eQTLs missed from tissue-based analysis to expand the space of eQTLs and make more informed decisions on disease-causing genes and variants.
To facilitate the identifications of cell-type-specific and context-dependent eQTLs, statistical methods have been developed for both bulk samples through digital deconvolution analysis, and for single cell data, which offer finer cell type resolutions and can capture dynamic effects of eQTLs. In this review, we first discuss existing statistical methods that use bulk tissues and single cell data to identify cell-type-specific and context-dependent eQTLs. We then summarize results from empirical studies using bulk samples, purified cells, and single cell data. We conclude with the limitations of the existing computational methods and future methodological needs.

eQTL inference using bulk samples
Early eQTL studies collected gene expression data using microarrays, where gene expression levels need to be normalized to remove batch effects and the normalized data are analyzed to identify eQTLs. Consider a study with N subjects, S SNPs, and G genes. For the nth subject, let yng denote the expression level of the gth gene, and xns denote the genotype of the sth SNP. For a SNP with two alleles, say A and a, its three genotypes AA, Aa, and aa can be coded as 2, 1, and 0, respectively. The relationship between the observed gene expression level yng and genotype xng can be studied through the following regression model where # is the intercept, #' is the effect of the sth SNP on the expression of the gth gene, and "#' is the error term, often assumed to follow a normal distribution. A more comprehensive model may also include other covariates, such as age and sex. Testing the null hypothesis that the sth SNP does not have any effect on the expression level of the gth gene is equivalent to testing #' = 0. A typical eQTL study considers more than 20,000 genes and up to millions of SNPs. Because of the large number of SNPs to be tested, researchers often focus on cis-eQTLs for a given gene, which are SNPs in close physical proximity of the gene, say within one million base pairs of the candidate gene. In contrast, trans-sQTLs correspond to SNPs that are on different chromosomes or further away from the gene of interest on the same chromosome.
Most eQTL findings have been for cis-eQTLs largely due to statistical power difference in detecting cis-eQTLs and trans-eQTLs. With a few hundred samples, which is the typical size of an eQTL study, there is limited power to do a genome wide association study required to identify trans-eQTLs, which often have smaller effect sizes than cis-eQTLs.  . It was found that considering allelic specific expression could identify 20% to 100% more genes with eQTLs across 28 tissues in the GTEx project than only considering total expression levels using TreCASE (ZHABOTYNSKY et al. 2022), and the power gain of mixQTL was equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage ).
The analysis of tissue level data also allows for the investigation of context-dependent eQTLs if the context can be well defined. For example, 369 sex-biased eQTLs were inferred through separate analyses of male and female GTEx samples (CONSORTIUM 2020), where the sex of an individual may be considered a context. Furthermore, 178 population-biased eQTLs were also implicated, where population origin may be considered another context. Other contextdependent effects can be considered by including an interaction term between the context variable and the SNP genotype in regression model (1).

Cell-type-specific eQTL inference using bulk samples
A number of studies have been published that use purified cells of different cell types to infer cell-type-specific eQTLs (ct-eQTLs). As the gene expression data in these samples are collected in the same manner as bulk tissue samples, the same statistical methods for bulk samples can be applied to infer ct-eQTLs from these data, although the sample size tends to be smaller and the measurement noises may be higher. In addition, the purified cell types may be contaminated with other cell types.
Without collecting data from purified cell types, Westra et al. (WESTRA et al. 2015) proposed to identify ct-eQTLs by investigating whether there is an interaction effect between the surrogate score for a cell type and candidate SNP's genotype on bulk gene expression levels from the collected samples. More formally, this model can be written as with two additional terms #-" and #,'-( "' × " ) compared to model (1), where mn is a proxy marker for the cell type of interest in the nth individual, #-is the effect of the proxy marker on the expression level of the gth gene, and #,'-is the interaction effect between genotype "' and proxy marker m. A significant interaction effect, i.e. #,'-≠ 0, is interpreted as the cell-type-specific effect of SNP s on expression of gene g. The same approach was used to study context-dependent eQTLs in (ZHERNAKOVA et al. 2017).
Instead of deriving cell-type-specific proxy markers or enrichment scores, the estimated cell type proportions can also be used as a proxy for a given cell type. Recent years have seen the developments of many methods to deconvolute bulk RNA-seq samples to infer proportions of different cell types and cell-type-specific expression levels (AVILA COBOS et al. 2020). For the nth subject, let "4 denote the estimated proportion of the kth cell type for this individual, where there is a total of K cell types. We can use the following regression model to detect cs-eQTLs for the kth cell type.
(3) In this model, #4 is the cell type proportion effect from the kth cell type, and #,'4 is the interaction effect between the sth SNP and the proportion of kth cell type. A non-zero #,'4 suggests a cell-type-specific effect for the sth SNP.
The formulations in (2) and (3) consider one cell type at a time and ignore the contributions of possible cell-type-specific effects from other cell types, both in terms of proportions and the expression profiles, leading to a potential loss of information. Moreover, because models (2) and (3) only consider a tissue and cell type pair at a time, and may not attribute a non-zero #,'4 to the correct cell type. For example, consider the case where there are two cell types, where k = 1 or 2. If #,'5 > 0, then #,'7 < 0 due to the constraint that "5 + "7 = 1. Furthermore, the power differs across cell types, with a higher statistical power in detecting ct-eQTLs for more abundant cell types. A more comprehensive model that takes into account all cell types simultaneously can be formulated as subject to the constraint that ∑ "4 = 1 4 . Correspondingly, the sth SNP is a ct-eQTL for the kth cell type if #,'4 ≠ 0. Another way to parametrize this model is in the form of Note that the #4 + #,'4 × "4 term in model (5)  In practice, ct-eQTL analysis based on the above models often use transformed gene expression data instead of read counts and this may distort the association between the observed gene expression level with cell type compositions, leading to reduced power and inflated false positives. Recently, Little and colleagues proposed CSeQTL (Cell type-Specific eQTL) to jointly model total read counts and allele specific counts by a negative binomial (or Poisson) and a beta-binomial (or binomial) distribution with the consideration of covariates, cell type composition, and SNP genotype (LITTLE et al. 2022). CSeQTL also includes allele-specific expression to further increase the power to detect cell type-specific eQTLs. Empirical studies showed higher power of CSeQTL than linear model-based methods. In comparison, DeCAF is a linear model-based method that considers both total expression levels and allele-specific expression (KALITA AND GUSEV 2022).
Although the above approaches are intuitive, there are challenges to apply them to infer ct-eQTLs in practice. First, there are uncertainties in the estimated proxy markers and cell type proportions, and these need to be appropriately incorporated in the analysis. However, this issue has only recently been studied (CAI et al. 2022; XIE AND WANG 2022) and the impact of incorporating these uncertainty estimates in ct-eQTL inference needs to be studied. Second, although ct-eQTLs may be inferred for all cell types in principle, it would be relatively easier for more abundant cell types than for less abundant or rare cell types. Third, there has to be sufficient variations in cell type compositions across subjects to allow ct-eQTL inference. For example, in the extreme case that all the subjects have identical cell type proportions, the parameters in the above models are not identifiable. Fourth, the above formulation does not take into account similarity among some cell types, although methods have been proposed to consider cell lineage (YANKOVITZ et al. 2021).

Cell-type-specific eQTLs from single cell data
In addition to bulk data, single-cell data are increasingly used for ct-eQTL inference (JONKERS AND WIJMENGA 2017; VAN DER WIJST et al. 2018;LIU et al. 2021a;NEAVIN et al. 2021). Most . SURGE approximates the posterior distribution of all latent variables using mean-field variational inference. Similar to CellRegMap, only eQTLs identified in previous studies were used for analysis due to statistical power concerns, and the reliance on the normal distribution assumption of the error terms limits its direct application to single cell data. As a result, single cells have to be aggregated to meta cells before the model is applied.

Empirical results on cell-type-specific and context-dependent eQTL analyses
In this section, we review the growing evidence of cell-type-specific and context-dependent eQTLs using bulk samples, purified cells, and single cells.

Whole blood
Among the first analysis of cs-eQTLs using bulk samples, (WESTRA et al. 2015) analyzed whole blood gene expression data of 5,683 individuals from seven cohorts to infer cell-type-specific cis-eQTLs. A total of 1,115 cis-eQTLs (8.5% of the significant cis-eQTLs from prior eQTL analysis for the whole tissue) were found to have significant interaction effects with neutrophil proxy. The results were replicated in six individual purified cell-type eQTL datasets. More importantly, the authors showed SNPs associated with Crohn's disease preferentially affect gene expression within neutrophils, demonstrating the insights gained from cell-type-specific eQTL analysis. (ZHERNAKOVA et al. 2017) performed eQTL and context-dependent eQTL analysis on RNA-seq data of peripheral blood from 2,116 unrelated individuals, identifying 23,060 genes with eQTLs, among which 2,743 (12%) showed context-dependent effects.

GTEx data
Cell-type-specific analysis was performed on the GTEx data in (KIM-HELLMUTH et al. 2020). The authors estimated cell type enrichment for seven cell types (adipocytes, epithelial cells, hepatocytes, keratinocytes, myocytes, neurons, and neutrophils) across 35 tissues. Between 43 pairs of tissues and cell types, they identified eQTLs specific to at least one cell type by testing for interaction effects between SNP and cell type enrichment on the observed expression levels.
They found that these cell-type-interaction QTLs, called ieQTLs, are enriched for genes with tissue specific eQTLs and generally not shared across unrelated tissues. Furthermore, these ieQTLs are enriched for complex trait associations and had colocalization signals for hundreds of loci that were undetected in bulk tissue. (AYGUN et al. 2021) used a cell-type-specific in vitro model system including 85 neural progenitors and 74 virally labeled and sorted neuronal progeny for eQTL analysis. They identified 2,079 and 872 eQTLs in progenitors and neurons, respectively, with 66% and 47% of these eQTLs not identified in fetal bulk brain eQTLs from a largely overlapping sample or in adult data from GTEx. These eQTLs had cell-type-specific colocalizations with GWAS hits for neuropsychiatric disorders and other brain-related traits.

Brain cells
Microglia in the brain play critical roles in immune defense and development, and are implicated in neurodegenerative disorders. (YOUNG et al. 2021) gathered gene expression profiles in primary microglia isolated from 141 patients undergoing neurosurgery. A total of 585 microglia eQTLs were identified. Through joint analysis with monocytes and IPSDMac, 855 microglia eQTLs were inferred, with 108 microglia specific, and 449 shared across three cell types. For colocalization with GWAS hits, there was an excess of colocalized microglial eQTLs for Alzheimer's disease, Parkinson's disease, and inflammatory bowel disease.

Melanocyte cultures
Because melanocytes give rise to melanoma but account for less than 5% of human skin biopsies,  performed eQTL analysis in primary melanocyte cultures from 106 newborn males to identify eQTLs in melanocytes. The identified melanocyte eQTLs differed considerably from those from the GTEx tissues, including skin. Novel risk genes for melanoma were implicated using the transcriptome wide association study based on this data set.

Immune cells
In the DICE project, 13 immune cell types were isolated from 106 leukapheresis samples of 91 healthy subjects (SCHMIEDEL et al. 2018). It was found that eQTLs are highly cell type specific, and sex has a major effect on gene expression. In the ImmuNexUT study, with samples from 79 healthy controls and 337 patients diagnosed with different immune-mediated diseases, (OTA et al. 2021) purified 28 immune cell types from these individuals with a total of 9,852 samples, and performed cell-type-specific eQTL analysis. They identified a median of 7,092 genes with eQTLs in each cell type, 2.2-fold more than that identified in the DICE study (SCHMIEDEL et al. 2018). They further identified eQTLs that were only present in patients.

PBMC
In a proof of concept study, (VAN DER WIJST et al. 2018) analyzed 25,000 single cell RNA-seq data from 45 donors. In total, they identified 379 unique cis-eQTLs involving 287 unique eGenes across six cell types. A total of 48 cis-eQTLs were only identified from cell-type-specific analysis. The authors also demonstrated the benefit of performing cell subtype analysis for cMonocytes and ncMonocytes.
In (OELEN et al. 2022), the authors exposed PBMC samples from 120 individuals to three pathogens and sequenced these samples in an unstimulated condition and after 3 hours and 24 hours in vitro stimulation for the three pathogens. They identified cell-type-specific eQTLs, with the number of such eQTLs correlated with the cell type abundance. Furthermore, the effects of eQTLs differed across pathogen stimulations, the strongest enrichment for GWAS signals was observed for eQTLs that were identified from stimulation experiments.
The investigators of the OneK1K cohort analyzed 1.27 million PBMC single cell RNA-seq data from 982 donors of Northern European ancestry and performed eQTL analyses on 14 immune cell types (YAZAR et al. 2022). A total of 26,597 cis-eQTLs were identified, with most having celltype-specific effects. Dynamic effects were also observed based on pseudo-time trajectory for the B cell landscape. In addition to cis-eQTLs, 990 trans-eQTLs were identified, with most genes regulated by trans-eQTLs being specific for a single cell type, and none were ubiquitous across cell types. Co-localization analysis between eQTLs and GWAS signals suggested that 60% of colocalizing genes were detected upon activation and co-localization is very cell typespecific. (NEAVIN et al. 2021) gathered single cell RNA-seq data from 64,018 fibroblasts of 79 donors and performed single cell eQTL analysis. For the six types of fibroblasts and four types of iPSCs, the majority of detected eQTLs in fibroblasts were specific to one cell type. Only 41% of the 45,503 eQTLs identified in the six fibroblast types were significant in GTEx, Using 125 iPSC lines derived from 125 donors, (CUOMO et al. 2020) collected single cell gene expression data from 36,044 cells at four differentiation time points using full-length RNAsequencing as well as the expression levels of selected cell surface markers through cell sorting. Substantial regulatory changes were observed with over 30% of eQTLs being specific to a single stage. Hundreds of eQTLs at the mesendo and defendo stages were new. This study also tested for associations between pseudo-time and the genetic effect size using a linear model, and identified 899 time-dynamic eQTLs. With 89 healthy donors, (SCHMIEDEL et al. 2022) performed eQTL analysis for more than one million activated CD4+ T cells classified into 19 distinct CD4+ T cell subsets. The effects of many eQTLs were strongly manifested only in certain cell types in an activation-dependent manner, and significant sex effects were also observed. (NATHAN et al. 2022) performed single cell eQTL analysis using gene expression data from more than 500,000 unstimulated memory T cells from 259 Peruvian individuals. They found that the effects of one-third of cis-eQTLs were mediated by continuous multimodally defined cell states, with independent eQTLs at some loci having opposing cell-state relationships.

Brain cortex
In a recent study using single cell data from prefrontal cortex, temporal cortex and deep white matter from 192 individuals, a total of 7,607 genes were found to have eQTLs from eight cell types (BRYOIS et al. 2022). A majority of cell-type-specific eQTLs were replicated in tissue-level eQTLs for cortical tissue, with eQTLs for more abundant cell types more likely replicated in the tissue results. It was also found that the effect sizes estimated from tissue tend to be lower than those estimated from cell-type-specific analysis. As expected, the number of cis-eQTLs identified in a cell type was strongly correlated with the number of cells available for the corresponding cell type. The effect sizes were more similar for similar cell types with microglia being most different from other cell types. Co-localization analysis suggests that disease risk at a given GWAS locus is usually mediated by a single gene acting in a specific cell type.

Dopaminergic neuron differentiation
In (JERBER et al. 2021), the authors differentiated 215 human induced pluripotent stem cell (iPSC) lines to profile over 1 million cells across four conditions, including three differentiation stages (progenitor-like, young neurons and more mature neurons) and cells exposed to a chemical stressor. eQTL analysis was performed for 14 cell types, identifying 4,828 genes with eQTLs. Compared to eQTLs identified from GTEx brain tissues, this study identified 2,366 new eQTLs. As for colocalization analysis, 1,284 eQTLs were colocalized, with 597 being new, and 67% of these new colocalizations were associated with eQTLs detected in later differentiation stages or upon stimulation. A colocalization using aggregated data from different cell types yielded a much smaller number of colocalizations, suggesting the importance of considering cell type specificity.

Discussion
A main driving force for many eQTL studies in recent years is their potential to offer insights on the GWAS signals which mostly fall into non-coding regions of the human genome. Because of the importance of cell-type-specific and context-dependent eQTLs, there is a growing number of studies collecting and analyzing data to facilitate such analyses. Coupled with the rise of rich data that offer cell-type-specific and context-dependent gene regulation information, there is also a need for more statistically robust and computationally efficient methods for these analyses. In this paper, we have reviewed statistical methods that have been developed and applied to analyze different types of data for cell-type-specific and context-dependent eQTL inference, including bulk samples, purified cells, and single cells.
Despite these progresses, many issues remain to be resolved, especially in anticipation of the many population-level single cell data to be gathered in the near future that will involve tens of thousands of individuals and tens of millions of single cells across different tissues. For example, to fully respect the nature of the single cell data, the single data are more appropriately modeled as Poisson or negative binomial distribution. Yet, most published studies have adopted linear regression models and often aggregated multiple cells to address the sparsity of the observed single cell data. For bulk samples using RNA sequencing, efforts have been made to use allele-specific expression to improve statistical power for identifying eQTLs, and limited work has been done for single cell data to capitalize on allele-specific expression information. There are challenges in appropriately defining cell types and contexts for cell-typespecific and context-dependent eQTL analysis. The work of (CUOMO et al. 2022) and(STROBER et al. 2022) represent some initial efforts and more needs to be done to fully capture and utilize the cell state information for eQTL discoveries, including non-linear transcription programs (WANG AND ZHAO 2022). In addition, as more than one SNP may jointly affect expression levels (CONSORTIUM 2020;ABELL et al. 2022), methods that include joint effects are likely to be more powerful and can better characterize the relationship between gene expression levels and

SNPs.
With many studies performed on bulk tissues, and the availability of a number of methods to use bulk tissue samples for cell-type-specific eQTL analysis, there is a need to effectively integrate the results from single cell data and bulk tissue data. Furthermore, as different tissues may share cells of similar cell types and states, there is also a need to better integrate results across tissues and studies (FLUTRE et al. 2013;URBUT et al. 2019).
As for all genetic studies, study design is important, such as the number of samples to be collected and, in the case of single cells, the number of cells per subject and the sequencing depth of each cell. A number of software and tools have been developed to facilitate this analysis under relatively simple statistical models for data analysis (MANDRIC et al. 2020;DONG et al. 2021;SCHMID et al. 2021). Further developments are needed to incorporate more comprehensive statistical models for analysis, and the consideration of other information, e.g.
alleles-specific expression, in inferring eQTLs. Due to the limited statistical power, most studies to date have focused on cis-eQTLs. For example, fewer than 150 genes were found to be affected by trans-eQTLs in the GTEx project (CONSORTIUM 2020). There is a critical need to design statistical methods to identify trans-eQTLs, both for bulk tissues and for cell-type-specific and context-dependent effects.
We have focused on the inference of eQTLs in this paper. There are many downstream applications of eQTL analysis. For example, there is need for methods that perform colocalization analysis with cell-type-specific and context-dependent eQTL results without focusing on a specific set of SNPs through simple thresholding of statistical significance, while accounting for the linkage disequilibrium information across the SNPs in a region. Cell-typespecific and context-dependent eQTLs also offer the opportunity to improve the identifications of candidate genes through transcriptome wide association studies at the cell-type and contextdependent level, and the recently developed methods (SONG et al. 2023;OKAMOTO et al. 2023) for bulk samples can be extended for cell-type-specific and context-dependent analyses. There is also the need for Mendelian randomization methods to infer the causal relationship between transcript levels and complex traits and diseases (RICHARDSON et al. 2020;YUAN et al. 2020;ZHOU et al. 2020;LIU et al. 2021b). In this setting, the existing methods for bulk samples may not be adequate to deal with the count nature of the single cell data, data sparsity, and the need to integrate information from multiple data sources for more informed analysis and decision. In addition to eQTLs that affect gene expression levels, genetic variants can also affect gene expression variances and co-variances (HULSE AND CAI 2013;EK et al. 2018;SARKAR et al. 2019;MARDERSTEIN et al. 2021). Compared to eQTL analysis, relatively little has been done to identify such variants. Even less has been explored at the cell type level, where cell-typespecific co-expressions may be inferred using either bulk samples (SU et al. 2022b) or single cell data (SU et al. 2022a). A comprehensive catalog of eQTLs that have cell-type-specific and context-dependent effects on gene expression variance and co-variance will better characterize genetic regulation of expressions and interpret GWAS results.
Although gene expression has been the focus of cell-type-specific and context-dependent analysis, other data types are being increasingly collected for similar analysis, such as methylation, chromatin accessibility, and proteomics, based on single cell data, e.g. . The methods and tools developed for bulk and single cell RNA-seq data may also be applicable to other data types, such as recently published methylation data from the GTEx subjects (OLIVA et al. 2023) and an meQTL dataset derived from primary melanocytes of 106 individuals ). More importantly, as different data types reflect different aspects of the same biological process, there is a need to integrate data from different modality to assess the genetic effects of SNPs on gene expression, methylation, chromatin accessibility, protein expression, and other molecular phenotypes. Such integrated analysis will likely yield more informative annotations of the SNPs to facilitate the interpretation of the GWAS results (HORMOZDIARI et al. 2018). Figure 1. Illustration of eQTL analysis at different resolutions: single cells, purified cells, and bulk samples. Shown are data from three individuals with genotypes of AA, AG, and GG, respectively. There are two cell types making up the bulk samples, the oval shaped cells and the triangle shaped cells. For single cell data, we can observe expression level at the single cell level. For example, for the first individual with genotype AA, there are four oval shaped cells with expression level at 0.9, 1.1, 0.8, and 1.2, and two triangle shaped cells with expression level at 3.2 and 2.8, respectively. eQTL analysis can be performed for two cell types separately using single cells across these three individuals to correlate genotypes with observed single-cell level gene expression data. For data from purified cells, we observe aggregated gene expression levels for different cell types but without individual cell level measurements. The average expression level for the oval shaped cells is 1, 2, and 3, respectively, for the three individuals. For data from bulk samples, we can no longer distinguish contributions from two distinct cell types. The average expression level for the three individuals is 1.7, 2.0, and 1.7, respectively. For single cell data, not only we can study association between genotypes and cell-type-specific expressions, we can also correlate genotypes with cell type proportions. Through deconvolutions methods, the bulk samples may be deconvoluted to different cell types to allow cell-type-specific eQTL analysis with estimated cell type proportions from different individuals.

Purified Cells
Bulk Samples